{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "bccd47fc",
   "metadata": {},
   "source": [
    "<a href=\"https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/examples/vector_stores/AzureCosmosDBMongoDBvCoreDemo.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "db0855d0",
   "metadata": {},
   "source": [
    "# Azure CosmosDB MongoDB Vector Store\n",
    "In this notebook we are going to show how to use Azure Cosmosdb Mongodb vCore to perform vector searches in LlamaIndex. We will create the embedding using Azure Open AI. "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e4f33fc9",
   "metadata": {},
   "source": [
    "If you're opening this Notebook on colab, you will probably need to install LlamaIndex 🦙."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cc10b892",
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install llama-index-embeddings-openai\n",
    "%pip install llama-index-vector-stores-azurecosmosmongo\n",
    "%pip install llama-index-llms-azure-openai"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "712daea5",
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install llama-index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c2d1c538",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import json\n",
    "import openai\n",
    "from llama_index.llms.azure_openai import AzureOpenAI\n",
    "from llama_index.embeddings.openai import OpenAIEmbedding\n",
    "from llama_index.core import VectorStoreIndex, SimpleDirectoryReader"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "26c71b6d",
   "metadata": {},
   "source": [
    "### Setup Azure OpenAI\n",
    "The first step is to configure the models. They will be used to create embeddings for the documents loaded into the db and for llm completions. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "67b86621",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "# Set up the AzureOpenAI instance\n",
    "llm = AzureOpenAI(\n",
    "    model_name=os.getenv(\"OPENAI_MODEL_COMPLETION\"),\n",
    "    deployment_name=os.getenv(\"OPENAI_MODEL_COMPLETION\"),\n",
    "    api_base=os.getenv(\"OPENAI_API_BASE\"),\n",
    "    api_key=os.getenv(\"OPENAI_API_KEY\"),\n",
    "    api_type=os.getenv(\"OPENAI_API_TYPE\"),\n",
    "    api_version=os.getenv(\"OPENAI_API_VERSION\"),\n",
    "    temperature=0,\n",
    ")\n",
    "\n",
    "# Set up the OpenAIEmbedding instance\n",
    "embed_model = OpenAIEmbedding(\n",
    "    model=os.getenv(\"OPENAI_MODEL_EMBEDDING\"),\n",
    "    deployment_name=os.getenv(\"OPENAI_DEPLOYMENT_EMBEDDING\"),\n",
    "    api_base=os.getenv(\"OPENAI_API_BASE\"),\n",
    "    api_key=os.getenv(\"OPENAI_API_KEY\"),\n",
    "    api_type=os.getenv(\"OPENAI_API_TYPE\"),\n",
    "    api_version=os.getenv(\"OPENAI_API_VERSION\"),\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0f3c2643",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core import Settings\n",
    "\n",
    "Settings.llm = llm\n",
    "Settings.embed_model = embed_model"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "eecf4bd5",
   "metadata": {},
   "source": [
    "Download Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6df9fa89",
   "metadata": {},
   "outputs": [],
   "source": [
    "!mkdir -p 'data/paul_graham/'\n",
    "!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f7010b1d-d1bb-4f08-9309-a328bb4ea396",
   "metadata": {},
   "source": [
    "### Loading documents\n",
    "Load the documents stored in the `data/paul_graham/` using the SimpleDirectoryReader"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c154dd4b",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Document ID: c432ff1c-61ea-4c91-bd89-62be29078e79\n"
     ]
    }
   ],
   "source": [
    "documents = SimpleDirectoryReader(\"./data/paul_graham/\").load_data()\n",
    "\n",
    "print(\"Document ID:\", documents[0].doc_id)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c0232fd1",
   "metadata": {},
   "source": [
    "### Create the index\n",
    "Here we establish the connection to an Azure Cosmosdb mongodb vCore cluster and create an vector search index."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8731da62",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pymongo\n",
    "from llama_index.vector_stores.azurecosmosmongo import (\n",
    "    AzureCosmosDBMongoDBVectorSearch,\n",
    ")\n",
    "from llama_index.core import VectorStoreIndex\n",
    "from llama_index.core import StorageContext\n",
    "from llama_index.core import SimpleDirectoryReader\n",
    "\n",
    "connection_string = os.environ.get(\"AZURE_COSMOSDB_MONGODB_URI\")\n",
    "mongodb_client = pymongo.MongoClient(connection_string)\n",
    "store = AzureCosmosDBMongoDBVectorSearch(\n",
    "    mongodb_client=mongodb_client,\n",
    "    db_name=\"demo_vectordb\",\n",
    "    collection_name=\"paul_graham_essay\",\n",
    ")\n",
    "storage_context = StorageContext.from_defaults(vector_store=store)\n",
    "index = VectorStoreIndex.from_documents(\n",
    "    documents, storage_context=storage_context\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8ee4473a-094f-4d0a-a825-e1213db07240",
   "metadata": {},
   "source": [
    "### Query the index\n",
    "We can now ask questions using our index."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0a2bcc07",
   "metadata": {},
   "outputs": [],
   "source": [
    "query_engine = index.as_query_engine()\n",
    "response = query_engine.query(\"What did the author love working on?\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8cf55bf7",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The author loved working on multiple projects that were not their thesis while in grad school,\n",
      "including Lisp hacking and writing On Lisp. They eventually wrote a dissertation on applications of\n",
      "continuations in just 5 weeks to graduate. Afterward, they applied to art schools and were accepted\n",
      "into the BFA program at RISD.\n"
     ]
    }
   ],
   "source": [
    "import textwrap\n",
    "\n",
    "print(textwrap.fill(str(response), 100))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "68cbd239-880e-41a3-98d8-dbb3fab55431",
   "metadata": {},
   "outputs": [],
   "source": [
    "response = query_engine.query(\"What did he/she do in summer of 2016?\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fdf5287f",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The person moved to England with their family in the summer of 2016.\n"
     ]
    }
   ],
   "source": [
    "print(textwrap.fill(str(response), 100))"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
